Mobilise-D insights to estimate real-world walking speed in multiple conditions with a wearable device

This study aimed to validate a wearable device’s walking speed estimation pipeline, considering complexity, speed, and walking bout duration. The goal was to provide recommendations on the use of wearable devices for real-world mobility analysis. Participants with Parkinson’s Disease, Multiple Sclerosis, Proximal Femoral Fracture, Chronic Obstructive Pulmonary Disease, Congestive Heart Failure, and healthy older adults (n = 97) were monitored in the laboratory and the real-world (2.5 h), using a lower back wearable device. Two walking speed estimation pipelines were validated across 4408/1298 (2.5 h/laboratory) detected walking bouts, compared to 4620/1365 bouts detected by a multi-sensor reference system. In the laboratory, the mean absolute error (MAE) and mean relative error (MRE) for walking speed estimation ranged from 0.06 to 0.12 m/s and −2.1 to 14.4%, with intraclass correlation coefficients (ICCs) between good (0.79) and excellent (0.91). Real-world MAE ranged from 0.09 to 0.13 m/s, MARE from 1.3 to 22.7%, with ICCs indicating moderate (0.57) to good (0.88) agreement. Lower errors were observed for cohorts without major gait impairments, less complex tasks, and longer walking bouts. The analytical pipelines demonstrated moderate to good accuracy in estimating walking speed. Accuracy depended on confounding factors, emphasizing the need for robust technical validation before clinical application. Trial registration: ISRCTN – 12246987.


True Positive Evaluation
In the laboratory assessments, a total of 1365 WBs were detected by the reference system and 1298 WBs by the wearable device. To compare DMOs on a WB level, the analysis included only WBs that were concurrently detected by both systems (true positive analysis). All WBs with a time-overlap of more than 80% of their duration were considered true positive (TP), resulting in 692 WBs that were included in the analysis (see Methods and Supplementary Figs. 1 and 2 for more details). Based on these true positive WBs, we observed a mean error of 0.01 m/s (MRE = 5.9%) and a MAE of 0.10 m/s (MARE = 14.96%) across all cohorts (Fig. 2, left; Table 2). We found that walking speed was estimated with good reliability (ICC = 0.84) by the wearable device, with a slight overestimation compared to the INDIP reference system.
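The true positive matching described above can be illustrated with a minimal sketch. The exact overlap definition is given in the Methods, which are not reproduced here, so this example assumes that the overlap is measured relative to the longer of the two bouts and that each reference WB is matched at most once. The names `overlap_fraction` and `match_true_positives` are hypothetical and not part of the published pipeline.

```python
from typing import List, Tuple

Bout = Tuple[float, float]  # (start_s, end_s) of one walking bout


def overlap_fraction(a: Bout, b: Bout) -> float:
    """Overlap duration divided by the longer bout duration (assumed definition)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    longest = max(a[1] - a[0], b[1] - b[0])
    if overlap <= 0 or longest <= 0:
        return 0.0
    return overlap / longest


def match_true_positives(
    device: List[Bout], reference: List[Bout], threshold: float = 0.8
) -> List[Tuple[int, int]]:
    """Return (device_index, reference_index) pairs with overlap above the threshold."""
    pairs = []
    used_reference = set()  # assumption: each reference WB is matched at most once
    for i, dev_bout in enumerate(device):
        best_j, best_fraction = None, threshold
        for j, ref_bout in enumerate(reference):
            if j in used_reference:
                continue
            fraction = overlap_fraction(dev_bout, ref_bout)
            if fraction > best_fraction:  # strictly more than 80% overlap
                best_j, best_fraction = j, fraction
        if best_j is not None:
            used_reference.add(best_j)
            pairs.append((i, best_j))
    return pairs


# Illustrative bouts in seconds (made-up values, not study data)
device_wbs = [(0.0, 10.0), (20.0, 24.0), (40.0, 60.0)]
reference_wbs = [(0.5, 10.0), (20.0, 30.0), (41.0, 59.0)]
matches = match_true_positives(device_wbs, reference_wbs)
print(matches)
```

In this toy example the second device bout covers only 40% of the corresponding reference bout, so it is excluded from the WB-level comparison, which mirrors how a long reference WB that the wearable device fragments would drop out of the true positive analysis.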
In the 2.5-h real-world assessment, the reference system detected 4409 WBs, while the wearable device identified 4620 WBs. The average sensitivity and specificity for WB detection compared to the reference system were 0.65 and 0.99, respectively (Table 3). Across all detected WBs, 1414 (30% of all WBs) were identified as true positive WBs (i.e., more than 80% overlap with a reference WB). Based on these WBs, we observed a mean error of 0.06 m/s (MRE = 14.48%) and a MAE of 0.11 m/s (MARE = 20.31%) across all cohorts (Fig. 2, right; Table 2). As observed in the laboratory data, results showed a good reliability (ICC = 0.77) (Table 4), with an overestimation of walking speed by the wearable device (0.01 m/s).
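As a rough illustration of the sample-by-sample WB detection comparison summarized in Table 3, the sketch below converts detected bouts into per-sample walking masks and derives sensitivity and specificity from them. The helper names `to_mask` and `sensitivity_specificity`, the half-open `[start, end)` sample convention, and the example values are assumptions made for this sketch only.

```python
from typing import List, Tuple


def to_mask(bouts: List[Tuple[int, int]], n_samples: int) -> List[bool]:
    """Mark every sample inside any [start, end) bout as walking."""
    mask = [False] * n_samples
    for start, end in bouts:
        for k in range(max(start, 0), min(end, n_samples)):
            mask[k] = True
    return mask


def sensitivity_specificity(
    device_mask: List[bool], reference_mask: List[bool]
) -> Tuple[float, float]:
    """Sample-wise sensitivity and specificity of the device against the reference."""
    tp = sum(d and r for d, r in zip(device_mask, reference_mask))
    tn = sum((not d) and (not r) for d, r in zip(device_mask, reference_mask))
    fp = sum(d and (not r) for d, r in zip(device_mask, reference_mask))
    fn = sum((not d) and r for d, r in zip(device_mask, reference_mask))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Illustrative recording of 1000 samples (made-up values, not study data)
n = 1000
reference = to_mask([(100, 300), (600, 700)], n)
device = to_mask([(120, 300), (650, 710)], n)
sens, spec = sensitivity_specificity(device, reference)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Following the Table 3 description, such values would be computed per participant first and only then aggregated per cohort.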

Combined evaluation
To remove potential bias from focusing only on the true-positive WBs and to mimic actual use of a wearable device, where reference data may not be available, we performed a second evaluation in which we combined all WBs of each laboratory test and of each 2.5 h real-world recording by taking the median of the calculated DMOs (see Methods). These combined values were then compared between the systems. Results from laboratory data showed a mean error over all tests of 0.01 m/s (MRE = 7.47%) and a MAE of 0.12 m/s (MARE = 17.82%) (Fig. 3, left; Table 4). In contrast, in the real-world we observed a higher mean error over all participants of 0.11 m/s (MRE = 24.48%) and a MAE of 0.13 m/s (MARE = 26.47%) (Fig. 3, right; Table 4). For both environments the errors were higher than those estimated from the analysis on true-positive WBs. The biggest effect was seen on the ICC during the real-world recording, which dropped considerably to 0.33 (poor) across all cohorts and showed no (PFF: 0.04) or even negative correlation (MS: −0.15). This is not surprising due to the limited number of datapoints included for this type of analysis (one datapoint per participant).
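The combined evaluation can be sketched as follows: for each recording, the median over all WBs detected by each system is taken independently, so no WB matching is needed and the two systems may contribute different numbers of WBs. The function names and example speeds are hypothetical; the aggregation details of the actual analysis are described in the Methods.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def combined_error(device_speeds: List[float], reference_speeds: List[float]) -> float:
    """Signed error (m/s) between the medians of all device and all reference WB speeds."""
    return median(device_speeds) - median(reference_speeds)


def summarise(recordings: Dict[str, Tuple[List[float], List[float]]]) -> Tuple[float, float]:
    """Return (mean error, MAE) with one datapoint per recording."""
    errors = [combined_error(dev, ref) for dev, ref in recordings.values()]
    return mean(errors), mean([abs(e) for e in errors])


# Illustrative walking speeds in m/s (made-up values, not study data)
recordings = {
    "P01": ([0.9, 1.1, 1.3, 1.0], [0.8, 1.0, 1.2]),
    "P02": ([0.6, 0.7, 0.8], [0.7, 0.8, 0.9, 1.0]),
}
mean_error, mae = summarise(recordings)
print(f"mean error={mean_error:.2f} m/s, MAE={mae:.2f} m/s")
```

Because each recording contributes a single value, one recording with a large discrepancy can dominate the summary statistics, which is consistent with the sensitivity of the combined ICC noted above.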

Factors that can influence walking speed validity
Influence of the cohort. The MAE based on the true-positive evaluation differed by < 0.05 m/s between cohorts in both laboratory and real-world settings. In the laboratory, the COPD cohort had the lowest MAE (0.06 m/s) followed by HAs (0.08 m/s) (Table 2), whereas the PFF and CHF cohorts had the largest MAE of 0.12 m/s. In the real-world, HAs presented the lowest MAE (0.09 m/s) followed by the PFF cohort (MAE = 0.11 m/s) (Table 2). Walking speed tended to be overestimated for all cohorts apart from CHF, for which walking speed was underestimated by 0.06 m/s in the laboratory and 0.04 m/s in the real-world.
Influence of WB duration and walking speed. In the analysis based on the true positive WBs, errors decreased for longer WB durations (Fig. 4). MAE across all cohorts for very short WBs (< 10 s) ranged between 0.09 and 0.16 m/s, compared to 0.06-0.11 m/s for long WBs (between 60 and 120 s). However, as WB duration increased, the number of available WBs included in the validation analysis decreased as well. When looking at the combined approach (across all detected WBs) (Fig. 5), the trends from the true-positive analysis were confirmed. The MAE for the very short WBs (< 10 s) was lower than for the short WBs (10-30 s), and the number of very short WBs detected by the wearable device was disproportionately higher compared to the reference system. Overall, about two thirds of all WBs were shorter than 30 s. When removing very short WBs (< 10 s) from the calculation of the mean/median errors, the range in error was marginally smaller for some cohorts than the error observed for all WBs in the true positive analysis (improvement < 0.1 m/s). The median of the absolute difference of the combined analysis increased from 0.1 m/s over all WBs to 0.14 m/s for the WBs longer than 10 s.
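A duration-stratified error analysis of this kind could be sketched as below. The bin labels follow the duration categories named in the text (< 10 s, 10-30 s, 30-60 s, 60-120 s); the handling of bin boundaries, the extra bin for longer bouts, and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (lower bound inclusive, upper bound exclusive, label); boundary handling is assumed
BINS = [(0, 10, "<10 s"), (10, 30, "10-30 s"), (30, 60, "30-60 s"), (60, 120, "60-120 s")]


def duration_label(duration_s: float) -> str:
    """Assign a WB duration to one of the duration categories."""
    for low, high, label in BINS:
        if low <= duration_s < high:
            return label
    return ">=120 s"


def mae_by_duration(bouts: Iterable[Tuple[float, float, float]]) -> Dict[str, float]:
    """MAE per duration bin from (duration_s, device_speed, reference_speed) of matched WBs."""
    errors = defaultdict(list)
    for duration_s, device_speed, reference_speed in bouts:
        errors[duration_label(duration_s)].append(abs(device_speed - reference_speed))
    return {label: sum(values) / len(values) for label, values in errors.items()}


# Illustrative matched WBs (made-up values, not study data)
matched = [(5, 0.9, 0.7), (8, 0.6, 0.5), (20, 1.0, 0.9), (90, 1.2, 1.15)]
result = mae_by_duration(matched)
for label, value in result.items():
    print(f"{label}: MAE = {value:.2f} m/s")
```

Reporting the number of WBs per bin alongside each MAE would help with interpretation, since the text notes that fewer WBs are available at longer durations.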
In both environments, a clear linear negative relationship between the magnitude of the reference walking speed and the measurement errors was observed (Fig. 2). For the slowest WBs (< 0.6 m/s), we observed the largest absolute errors, increasing to 0.8 m/s in real-world WBs. Walking speed tended to be overestimated for slow WBs and underestimated for fast WBs. This trend can also be observed in the overall speed distribution of the WBs (Supplementary Fig. 2), which shows a larger number of slow walking bouts and a lower median gait velocity for the reference system compared to the wearable device.
Table 3. The performance of the WB detection calculated by comparing, sample by sample, the detected walking bout regions by the single wearable device with the detected walking bout regions by the reference system in the real-world recordings. Performance values are first calculated per participant and then aggregated per cohort, over all participants. Results are provided as mean and confidence intervals.
Influence of task complexity. As task complexity increased, so did the MAE. For instance, the most complex laboratory gait task ("simulated daily activities") presented the highest MAE across all cohorts (0.17 m/s) and the least complex task (the slow straight walking test) presented the lowest MAE (0.08 m/s) (Table 5, Fig. 6). Furthermore, the influence of task complexity was cohort dependent. The largest differences between the simple and complex gait tests were observed for the MS, PD, and PFF cohorts (P1 pipeline). In the real-world, for the same cohorts, differences were observed in the errors estimated between the WBs without turns and WBs with turns. For all cohorts, mean error and MAE from real-world assessments were comparable or slightly lower than from simulated daily activities.

Discussion
To our knowledge, this study is the most extensive validation of a comprehensive multi-stage analytical pipeline for estimation of walking speed from a single wearable device. Overall, our findings showed good to excellent validation results in the laboratory and moderate to good agreement in the real-world. We demonstrated that validity of walking speed estimation is slightly impacted by several factors including environment (laboratory vs real-world), clinical cohort, gait task complexity and other confounding factors (number of turns, WB duration, WB speed). Our results have strong implications for future research. Below, we provide our recommendations for future validations and on the use of wearable device-based walking speed in daily life and, more broadly, DMOs in general.

Overall validation results
Overall, laboratory walking speed demonstrated good to excellent agreement with the reference system, with the ICCs of the true positive WBs ranging from good (0.79) to excellent (0.91) and MAEs ranging from 0.06 to 0.12 m/s across all cohorts. Within the combined evaluation, the ICC of walking speed was slightly lower (0.72-0.82), indicating that only a small difference was introduced by the true positive evaluation. Previous studies conducted across HAs and various clinical cohorts in laboratory settings have shown lower or comparable results 27,32-34. However, in comparison to those studies, the pipelines in this study were validated over a wider variety of more complex gait tasks, challenging the estimation of walking speed as the signals are more variable and less cyclic, in comparison to steady-state and straight path gait. Estimating walking speed in real-world gait assessment poses challenges due to the complexity and non-standardized nature of environments. This difficulty is supported by previous literature, which has found that real-world assessments present a greater challenge for DMO estimation 35,36. Despite these challenges, we achieved good results, since agreement was found to be moderate to good (ICCs within true positive WBs ranging between 0.57 and 0.88) and MAE ranged from 0.09 to 0.13 m/s. With regard to the combined real-world WB analysis, the ICCs were lower than the ICCs from true positive WBs. The MAE remained within usable ranges (≤ 0.18 m/s), but MARE increased up to 44%, primarily due to large relative errors for low gait speeds. In the combined analysis, the median walking speed for each participant was calculated, which may have increased the impact of individual datapoints with larger errors, as there was only one data point per participant. In some instances, we also observed negative ICCs (MS cohort = −0.15), which indicates a very poor correlation. Furthermore, this analysis reduced the range in walking speeds, where a larger number of slow WBs were included, which further increased the estimation error.
The WB detection results further show that, dependent upon the cohort, the detected WBs on average only cover between 57 and 72% of the overall walking present in the data. As the pipeline is tuned to provide high specificity, this relatively low sensitivity is expected, and this difference in the underlying data distribution partially explained the increased error values for the combined analysis. This demonstrated the bias introduced in the true-positive analysis. Furthermore, the combined approach aggregates real-world data into singular values, which does not reflect the entire distribution of walking speed 37.
In one of the few other studies that performed a real-world validation of walking speed, Soltani et al. 38 validated an algorithm based on a single wrist-worn sensor against a head-mounted Global Navigation Satellite System device, finding low bias [interquartile range (IQR) = −0.01, 0.00 m/s] and an accuracy expressed by root mean square error [IQR = 0.04, 0.06 m/s]. However, this validation was only performed in 30 HAs (mean age = 37 years) 38. Given the promising results we report for estimation of walking speed, and the results provided for the individual algorithmic blocks previously reported 26, we demonstrate that it is possible to use a single wearable device on the lower back for accurate quantification of mobility. However, it must be considered that the performance of algorithms and pipelines is dependent upon a variety of factors that should be taken into consideration during study design, future validation, and data interpretation.

Recommendation for real-world DMO validation
Validation protocol
Across all cohorts we observed larger absolute errors and lower ICCs with walking speed estimated from the real-world in comparison to laboratory assessment, showing the importance of real-world validations to obtain realistic and ecologically valid error estimates of DMOs. Despite this, our results also show that some real-world challenges can be replicated within laboratory settings, as the errors observed during the simulated daily activities in the laboratory were in fact higher than in the real-world. In these tasks, participants undertook short WBs containing turns, changes of direction and transitions. Scott et al. 39 compared the walking speed ranges recorded from the laboratory and 2.5 h protocol that were adopted in the present study and found a diverse profile of walking speed ranges in the laboratory that was representative of the walking speed range observed in the real-world. Future validation studies should take into account an adequate balance between challenging tasks (short WBs, turns and transitions) and long uninterrupted walks in the laboratory protocol to properly replicate the expected error ranges from the real-world.
In general, the expected error ranges were dependent upon task complexity. Most condition-specific differences were only prevalent in the real-world, which is consistent with previous research reported in HAs and people with MS and PD 13,18,40. Our findings motivate the inclusion of complex tasks and simulated daily activities into any future laboratory validation. However, we also recommend inclusion of real-world measurements to capture the true range of gait task complexity performed in daily life as well as a myriad of contextual factors, including the distribution of WB duration and walking speed.

Reference system
Utilization of the INDIP system as a reference during both the laboratory-based and real-world protocol proved to be successful in overcoming limitations in accuracy, battery life and usability, all of which are common restrictions of real-world reference systems previously adopted in the literature (e.g., wearable camera and global navigation satellite system (GNSS) 38,41). Specifically, the INDIP system has been validated, showing excellent agreement (ICC > 0.95) and very low MAEs (simulated daily activities ≤ 0.05 m/s) against a stereophotogrammetric system in the same cohorts and laboratory protocol as in the present study 31. The INDIP system was designed to enable the detection of gait and calculation of parameters based on as few assumptions as possible, particularly concerning the type of walking and the walking environment. Gait event detection relies on pressure insoles that are expected to work independently of the setting, and spatial parameter estimation is based purely on physics-based integration methods that estimate the 3D trajectory of the foot. The INDIP's performance was evaluated based on a complex experimental protocol specifically designed for mobility assessment. Experiments included selected cohorts of participants with various conditions affecting gait characteristics, performing a complex battery of motor tests designed to produce a heterogeneous and broad range of gait patterns. Results showed overall good/excellent reliability and high repeatability and accuracy for the DMOs analyzed across populations, walking speeds, and WBs. Therefore, the INDIP system is a valuable candidate to collect reference standard data for the analysis of gait in real-world conditions 42,43. Other existing technologies can be used for obtaining reference data "out-of-the laboratory" (e.g., cameras, markerless systems), but they have intrinsic limitations that make their use inefficient (time consuming data analysis or small volume of data capture), less accurate for stride-by-stride description or not robust to quantify specific gait outcomes (e.g., spatial outcomes). The INDIP system can be used in both laboratory and real-world settings to enable a concurrent validation of walking speed measurements, as provided in the present study. The recording duration of 2.5 h with the INDIP system enabled recording of a wide range of activities and walking speeds.

Data analysis
We adopted two approaches to analyzing walking speed: (i) only considering WBs that were directly matched between the wearable device and reference system (true positive evaluation) and (ii) considering the median value of walking speed across all available data (combined analysis).
The true positive evaluation allowed comparisons to be performed with high granularity on a WB level, allowing better understanding of the circumstances under which the wearable device performs best. However, for the true positive analysis we observed bias with regard to the overall walking speed ranges (Supplementary Figs. 1 and 2), resulting in a non-negligible impact on the real-world results. Therefore, the combined analysis is required to confirm observed error ranges, and differences in the results of the two approaches should be considered and discussed. Furthermore, the true positive analysis introduced the true-positive threshold as a parameter that influences the results. While we could not find a relevant effect of the selection of this parameter value on the walking speed error (Supplementary Figs. 1 and 2), this might influence other DMOs. As our results indicate, no single type of analysis can provide a definite and full picture of the error ranges. Given the lack of other established approaches to perform real-world comparisons of DMOs with a high granularity, we suggest our framework as a basis for future DMO validation studies.

Practical recommendations for the use of wearable devices for real-world walking speed measurements
Our results demonstrate that walking speed can be estimated accurately and reliably across a range of environments, cohorts, tasks and contextual factors. Based upon our promising validation results, below we provide our recommendations on the use of wearable devices for real-world walking speed measurement.

Influence of pipeline
For improved understanding of the error, the impact of individual DMOs within the pipeline should be considered. In our case, given the complexity of the respective algorithms, stride length is expected to have a larger contribution to the observed walking speed error compared to cadence (Supplementary Tables 1 and 2) 26. This motivates further research in more robust methods for spatial parameter estimation. Furthermore, the wearable device seems to record more short WBs than the reference, suggesting that longer continuous bouts of walking were split into multiple shorter WBs. Based on specific investigations of such cases, this was often due to limitations of the initial contact and left-right detection. Under challenging conditions (e.g., turns or stairs), these algorithms could not provide reliable stride information, leading to a separation of longer periods of walking into multiple WBs, as no valid stride was detected for multiple seconds. This could be the result of the wearable device being positioned on the lower back, whereas the reference system also comprised foot-mounted sensors, thus being more robust in quantifying gait events across longer periods. The full pipeline is implemented separately for each system, so the combined estimates of all DMOs needed to meet the criteria of a WB lead to heterogeneity between systems. This motivates further research in the detection of initial contacts and their laterality under challenging real-world conditions.

Walking bout duration
Real-world walking speed encapsulates a rich dataset of mobility that has been undertaken across various WBs which differ in their length, duration, and context. Each WB reflects a different profile of walking in terms of the number of turns, transitions, and periods of straight walking, which influences walking speed measurement. Therefore, it is not surprising that walking speed estimations were influenced by the WB duration, where very short WBs (< 10 s) presented the largest error. Additionally, the wearable device tends to detect a larger number of shorter WBs than the reference system but fewer WBs with intermediate durations (Fig. 5). This suggests that the wearable device tends to fragment gait sequences into smaller segments, possibly attributable to mis-detected initial contacts. We speculate that short WBs predominantly took place within confined indoor spaces such as the home environment. While walking speed captured at this short duration does not reflect steady state gait activities, it could still hold valuable information about balance and functional status (e.g., postural transition, weight-shift, sit-to-stand) 44. Algorithms optimized for straight walking in controlled settings had an increased likelihood of higher absolute errors at very short durations. Based on this, we would recommend using a lower cut-off (WBs > 10 s) to balance the number of removed WBs against retaining the minimum of 401 WBs needed to ensure reliability and validity for real-world gait monitoring in a single cohort 39.
Moving toward clinical application of wearable devices and walking speed measurement, it is important to consider in which specific real-world context wearable devices can quantify mobility most accurately and reliably. Our findings demonstrated that WBs > 30 s provide the most accurate and reliable measurement. WBs > 30 s can be characterized as medium to long in their duration. Walking speed estimated from medium length WBs (between 30 and 60 s) may reflect activities of daily living, such as intermittent periods of shopping or undertaking other errands in public spaces outside the home 18,45. In contrast, longer WBs (> 60 s) typically capture faster walking speeds that are closer to what is already being measured in the laboratory. Thus, walking speed measured in medium length WBs reflects a balance between capturing activities of daily living and sufficient periods of straight walking activity that enable the robust quantification of walking speed. However, for certain patients, walking continuously for 30 s as our cut-off suggests might already be strenuous. Thus, we would recommend using all WBs > 10 s to strike a balance between capturing a sufficient number of WBs for patients with a variety of condition severities and ensuring walking speed can still be quantified reliably. Future clinical validation studies with a larger number of participants with severe gait impairments are required to confirm the reported error ranges for specific disease populations and can confirm the influence of WB duration upon the functional insight of mobility provided by walking speed 46.

Walking speed
The speed at which each WB is completed is considered a confounding factor for the validity of walking speed in gait analysis 47. When exploring the average walking speed across all WBs, we found that walking speed in WBs undertaken at slower speeds (< 0.6 m/s) tended to be overestimated by the wearable device by ≤ 0.8 m/s, and WBs with faster speeds were underestimated, whereas moderate walking speeds provided the highest accuracy (Fig. 2). Longer WBs (> 60 s) were completed at faster walking speed in comparison to medium length WBs, which were undertaken at moderate speeds. Further, the overall number of slow WBs appears to be smaller for the wearable device. The speed distribution of only the true-positive WBs exhibits a similar shift in distributions but at overall higher speeds (Supplementary Fig. 2). In conjunction with the presented error values, this suggests that slow WBs are detected correctly, but their speed values are overestimated. Notably, these lower speeds were predominantly observed in short WBs. Consequently, this further justifies our recommendation that medium-length WBs provide the right balance between functional relevance and accuracy. Exploration of the clinical properties of walking speed encapsulated within these WBs will become a topic of research and will be further investigated in ongoing clinical validation efforts 46. Furthermore, measurements with cohorts consisting of predominantly slow walkers will likely result in larger error ranges. This is consistent with previous research that also validated algorithms based upon a single lower back sensor 33,48.
The algorithms used in this study were optimized and developed based on independent datasets to avoid bias. We foresee that future algorithms developed on the TVS dataset and other similar real-world datasets can improve on the speed dependency observed here.

Real-world complexity
Aside from the WB duration, accuracy of walking speed estimation was also dependent upon the complexity of tasks/activities. The influence of complexity was cohort dependent and had the largest influence upon error for the MS, PD and PFF cohorts (estimated from P1 pipeline). We would expect those cohorts to experience more gait impairments than the CHF, COPD and HA cohorts (estimated from the P2 pipeline). However, whether the observed effect is caused by specific gait properties of the respective cohorts or by shortcomings in the selected algorithms cannot be concluded based on the performed analysis.
Despite the challenges posed by outdoor environments (changes in terrain, weather and traffic negotiation (humans and vehicles)) 49,50, outdoor environments capture more prolonged and uninterrupted walks in comparison to indoor environments, such as the household, which represent more confined and cluttered spaces with limited capacity for completing sequences of straight walking. Thus, we would expect error ranges from long uninterrupted outdoor walks to be lower than those from confined indoor environments. Therefore, the combination of gait parameters with further contextual information might help to take this into account during data interpretation.

Limitations
While this study is one of the largest and most comprehensive validation studies for gait analysis based on wearable devices to date, the analysis of specific subgroup effects would require larger sample sizes. Potential links between the error, the condition severity and other medical comorbidities could not be established. Furthermore, the effect of walking aid use on results has not been assessed in this study. Future studies with more variable condition severity are needed to explore the influence of walking aid usage upon the validity of the analytical pipelines.
The real-world data was limited to 2.5 h for technical reasons. However, we accept that a recording of this length may not be sufficient to capture all the variability and patterns that would be included in multiple days of consecutive assessment. Due to technical issues with the devices, we were unable to assess some participants, which reduced our dataset. Data was also collected during the COVID-19 pandemic, which may have impacted participants' activity. While the analytical pipeline offers several strengths, its combined implementation does have limitations. As previously stated, the analytical pipelines for the wearable device data have the tendency to split longer WBs into multiple individual WBs. Hence, future research should explore whether this is caused by limitations of the initial contact and left-right detections, and how specific real-world contexts may influence walking speed performance. We found that error in walking speed estimation was more dependent upon stride length (spatial) estimation (MARE across all cohorts; laboratory = 14.31% and real-world 20.35%), whereas cadence (temporal) can be estimated with substantially lower errors (MARE across all cohorts; laboratory = 4.1% and real-world 4.8%) (Supplementary Tables 1 and 2). Therefore, future researchers looking to further improve performance of walking speed estimation should target optimization of spatial algorithms.

Conclusion
Through the extensive real-world and laboratory validation across multiple cohorts, this study represents, to the best of our knowledge, the most accurate estimate of the expected error ranges of a lower-back wearable device for estimation of walking speed. The presented state-of-the-art algorithm pipelines could reliably estimate DMOs across a wide range of scenarios, providing a solid foundation for future studies to establish their clinical meaningfulness 46. While complex setups like camera-based motion capture systems in the laboratory and wearable multi-modal sensor systems in real-world scenarios still provide superior performance and might be required for certain types of clinical analysis, we demonstrated the suitability of a single easy-to-use and inexpensive wearable device for movement monitoring across a wide range of clinical indications. This has the potential to make gait related parameters from long-term real-world recordings ubiquitously available for clinical decision making. Our results showed that various parameters can influence DMO performance and that multi-faceted analysis is crucial for understanding the capabilities of any DMO pipeline. This motivates the capture of additional context information during real-world measurements to focus analysis on signal areas where high reliability can be expected. Furthermore, we identified clear areas where future algorithm pipelines can still improve, and we believe that the captured dataset will be vital for the development of future algorithms specifically targeting the challenges of unsupervised real-world recordings.

Protocol
The protocol has been extensively detailed elsewhere 24. Participants were assessed in the laboratory and during a 2.5-h real-world observation. Mobility data was collected with a wearable device (McRoberts Dynaport MM+, sampling frequency: 100 Hz, triaxial acceleration range: ± 8 g/resolution: 1 mg, triaxial gyroscope range: ± 2000 degrees per second (dps)/resolution: 70 mdps), secured at the lower back with a Velcro belt. Participants were also asked to wear a multi-sensor INDIP reference system (sampling frequency: 100 Hz) 24,30. Specifically, two magneto-IMUs were positioned over the instep and fixed to shoelaces with clips, and a third IMU was attached to the lower back with Velcro. Distance sensors were then positioned asymmetrically with Velcro (one above the left ankle and another 3 cm higher on the right leg). Pressure insoles were selected for each participant's foot size and inserted into the shoe. The INDIP system has been validated in previous studies across a range of conditions and also in the TVS cohorts, showing excellent results and reliability in the quantification of mobility outcomes (MAE laboratory ≤ 0.02 m/s, simulated daily activities = 0.03 to 0.05 m/s). A complete overview of the validation results can be found in 31. The INDIP and the wearable device were synchronized using their timestamps (± 10 ms).
Participants only performed tasks that they felt comfortable and safe to do in both protocols.

Laboratory protocol
Participants were asked to complete seven motor tasks with increasing complexity: Straight walking (slow, normal and fast speed), Timed Up and Go, L-Test, Surface Test, Hallway Test and Simulated Daily Activities. Each task was designed to capture and assess various elements associated with real-world walking including a range of walking speeds, incline/steps, surface, path shape, turns and specific motor tasks to simulate typical real-world transitions 24,39.

Real-world protocol
Participants were assessed for up to 2.5 h in the real world, as they went about their normal activities unsupervised (home/work/community/outdoor). The duration of the observation was established as a trade-off between experimental, clinical, and technical requirements. To capture the largest possible range of activities during this assessment, participants were guided by the following list of activities: if relevant for their chosen environment, rise from a chair and walk to another room; walk to the kitchen and make a drink; walk up and down a set of stairs (if possible); walk outdoors (if possible, for a minimum of 2 min); if walking outside, walk up and down an inclined path. We did not supervise participants or prescribe how these tasks should be completed 24.

Calculation of walking speed
The evaluation of walking speed requires the combination of various algorithmic steps, including the identification of gait sequences and of initial contacts, and the estimation of DMOs, i.e., cadence and stride length. Selection of the top-ranked algorithms to detect gait sequences and to estimate initial contact events, cadence and stride length within identified gait sequences was determined in our previous work 26 (Fig. 1). Walking speed was then estimated from the outputs of the best-performing stride length and cadence algorithms using Eq. (1):

$$\text{Walking speed [m/s]} = \frac{\text{cadence [steps/min]}}{2 \cdot 60} \cdot \text{stride length [m]} \tag{1}$$

Two independent analytical pipelines (P1 and P2) were identified in this process due to differences in algorithm selection for gait-sequence detection and cadence for the different conditions included in the study 26. P1 provides the optimal combination of algorithms selected for HA, COPD, and CHF conditions, and P2 provides the optimal combination for PD, MS, and PFF (Fig. 7).

[...] real-world walking based on the percentage of a WB that was assessed to be a turn. Based on this we defined the following levels of gait complexity: (v) "simple" straight gait (< 20% covered by turns) and (vi) "complex" gait (≥ 60% covered by turns).
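As a minimal sketch, Eq. (1) and the two turn-based complexity levels given above can be written in Python as follows. The function and parameter names are hypothetical and not taken from the study's pipelines. How WBs with a turn fraction between the two thresholds are handled is not stated in this passage, so the sketch leaves them unlabelled.

```python
def walking_speed(cadence_steps_per_min: float, stride_length_m: float) -> float:
    """Eq. (1): walking speed in m/s from cadence and stride length."""
    # One stride is two steps, and one minute is 60 s, so dividing the
    # cadence by (2 * 60) converts steps/min into strides per second.
    strides_per_second = cadence_steps_per_min / (2 * 60)
    return strides_per_second * stride_length_m


def gait_complexity(turn_fraction: float):
    """Label a WB from the fraction (0 to 1) of it assessed to be a turn."""
    if turn_fraction < 0.20:
        return "simple"
    if turn_fraction >= 0.60:
        return "complex"
    # Assumption: bouts between the thresholds fall outside levels (v) and (vi).
    return None
```

For example, a cadence of 120 steps/min corresponds to one stride per second, so a stride length of 1.2 m yields a walking speed of 1.2 m/s.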

Influence of walking speed and walking bout duration
Given the impact of real-world WB durations and speeds 44 on the adopted biomechanical strategies 55, we analyzed their influence on the validity of the walking speed. For this, we assessed whether the validity of walking speed estimation differed within specific WB duration bins (< 10 s, > 10 s, 10-30 s, 30-60 s, > 60 s and > 120 s). This was first performed for all true positive WBs, comparing their errors across each WB threshold, and subsequently repeated for the combined analysis, by calculating the median walking speed for each participant within the respective speed bout and comparing the median values between the reference system and the wearable device. Together, these analyses permitted the validation of the quantification of walking speed across different walking strategies.
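The per-bin error comparison for true positive WBs can be sketched as below. This is an illustrative Python sketch, not the study's code: the tuple layout, the function name, and the choice of half-open bin edges (lower bound included, upper bound excluded) are assumptions, since the text does not say how boundary values are assigned. The bins overlap by design, so a single WB may contribute to several of them.

```python
from statistics import mean

# Duration bins from the text as (label, lower bound, upper bound) in seconds.
# Assumption: a bout belongs to a bin if lower <= duration < upper.
DURATION_BINS = [
    ("< 10 s", 0.0, 10.0),
    ("> 10 s", 10.0, float("inf")),
    ("10-30 s", 10.0, 30.0),
    ("30-60 s", 30.0, 60.0),
    ("> 60 s", 60.0, float("inf")),
    ("> 120 s", 120.0, float("inf")),
]


def mae_per_duration_bin(bouts):
    """Mean absolute walking speed error per duration bin.

    bouts: list of (duration_s, reference_speed, wearable_speed) tuples,
    one per true positive WB. Empty bins map to None.
    """
    result = {}
    for label, lower, upper in DURATION_BINS:
        errors = [abs(wd - ref) for dur, ref, wd in bouts if lower <= dur < upper]
        result[label] = mean(errors) if errors else None
    return result
```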

Validation measures
For all types of evaluations (all available WBs/aggregated values or on the respective subgroups), we calculated various statistical/comparison measures to quantify the walking speed estimation error for the sensitivity analysis:
• Intraclass correlation coefficient (ICC(2,1)) 56 was calculated to assess the association between the DMOs of the two systems. Based on ICC estimates, values < 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and > 0.90 were deemed to be indicative of poor, moderate, good, and excellent reliability, respectively 57.
• Absolute agreement was assessed by quantifying (i) the accuracy/mean absolute error (MAE), (ii) bias/mean error and (iii) precision/limits of agreement (LoA) 58 between walking speed estimates of both systems.
• Mean relative error (MRE) and mean absolute relative error (MARE) were estimated as the ratio between the (absolute) errors per WB and the corresponding estimates from the reference system, expressed as a percentage.
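The measures listed above can be illustrated with a short Python sketch using only the standard library. It is a sketch under stated assumptions rather than the study's implementation: the function names are hypothetical, the LoA are taken as bias ± 1.96 standard deviations of the errors (the common Bland-Altman convention, which the text does not spell out), and ICC(2,1) is computed with the standard two-way random-effects, absolute-agreement, single-measurement formula.

```python
from statistics import mean, stdev


def agreement_metrics(reference, wearable):
    """Bias, MAE, LoA, MRE and MARE for paired walking speed estimates."""
    errors = [w - r for r, w in zip(reference, wearable)]
    # Relative errors are expressed as a percentage of the reference estimate.
    rel_errors = [100.0 * e / r for e, r in zip(errors, reference)]
    bias = mean(errors)
    spread = 1.96 * stdev(errors)  # assumption: LoA = bias +/- 1.96 SD
    return {
        "bias": bias,
        "mae": mean([abs(e) for e in errors]),
        "loa": (bias - spread, bias + spread),
        "mre": mean(rel_errors),
        "mare": mean([abs(e) for e in rel_errors]),
    }


def icc_2_1(reference, wearable):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    rows = list(zip(reference, wearable))  # one row per WB, one column per system
    n, k = len(rows), 2
    grand = mean([v for row in rows for v in row])
    row_means = [mean(row) for row in rows]
    col_means = [mean(col) for col in zip(*rows)]
    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because ICC(2,1) measures absolute agreement, a constant offset between the two systems lowers it even when the estimates are perfectly correlated, which is why it is reported alongside bias and LoA.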

Figure 1. Overview of (a) the TVS protocol, (b) the analytical pipeline applied to estimate walking speed from the wearable device data (WD), (c) the approach to validating walking speed estimated from the analytical pipeline.

Figure 2. Residual plots of walking speed for all true-positive WBs recorded in the laboratory (left) and during the real-world recording (right). The margin plots represent the overall speed and error distributions. The margin plots are further grouped by the performed tests for the laboratory and by the cohort for the real-world recordings. The light blue bars around the Limits of Agreement (LoA) (dashed horizontal lines) represent their bootstrapped confidence intervals. The dashed black line represents the result of a linear regression on all datapoints. The grey area around the regression line represents the bootstrapped 95% confidence intervals.

Figure 3. Residual plots for the walking speed combined over all identified WBs. For the laboratory tests the median over all WBs within one motor task is taken (left). For the real-world recording the median over all WBs in the entire real-world assessment is shown (right), where each datapoint represents an individual participant. The margin plots represent the overall speed and error distributions. The margin plots are further grouped by the performed tests for the laboratory and by the cohort for the real-world recordings. The light blue bars around the Limits of Agreement (LoA) (dotted horizontal lines) represent their bootstrapped confidence intervals. The dashed black line represents the result of a linear regression on all datapoints. The grey area around the regression line represents the bootstrapped 95% confidence intervals.

Figure 4. The dependency of the absolute walking speed error of all true-positive WBs from the real-world recording on the WB duration reported by the reference system. At the top, WB errors are grouped by various duration bouts. At the bottom, the number of bouts within each duration group is visualized.

Figure 5. The walking speed estimations from the real-world recording of the reference system and the wearable device, from all WBs within the respective duration bouts. The boxplots show the distribution over all WBs. The bars in the upper plot show the absolute difference between the medians of the distributions (see right y-axis). The bottom plot shows the number of WBs in each duration bout.

Table 1. Demographic and clinical characteristics of the participants included in the real-world analysis. Values are presented as mean ± standard deviation. CAT chronic obstructive pulmonary disease (COPD) assessment test, EDSS expanded disability status scale, FEV1 forced expiratory volume in 1 second, KCCQ-12 Kansas City cardiomyopathy questionnaire-12, MDS-UPDRS III Movement disorder society unified Parkinson's disease rating scale part III, MoCA Montreal cognitive assessment, SPPB short physical performance battery, 6MWT 6 minute walking test, HA healthy adults, PD Parkinson's disease, MS multiple sclerosis, COPD chronic obstructive pulmonary disease, CHF congestive heart failure, PFF proximal femoral fracture.

Table 2. Characterization of relative and absolute errors, intraclass correlation coefficient (ICC), and limits of agreement (LoA) for walking speed estimated from the true-positive walking bouts (WBs) from all laboratory tasks combined and the real-world assessment. Values are either provided as mean and [5%, 95%] quantile or as mean and limit of agreement, if indicated by LoA.

Table 5. Dependency on complexity for a selection of the gait tasks. The results are shown for each cohort, with limits of agreement (LoA). "All" represents the statistics over all walking bouts (WBs) independent of the cohort.
Cohort | Error with LoA (m/s) | Rel. error with LoA (%) | Abs. error (m/s) | Rel. abs. error (%)

Scientific Reports | (2024) 14:1754 | https://doi.org/10.1038/s41598-024-51766-5

For the Mobilise-D technical validation study (TVS) 24, participants were recruited from five clinical cohorts (CHF, COPD, MS, PD, and PFF) alongside HA. Participants were recruited at five sites: The Newcastle upon Tyne Hospitals NHS Foundation Trust, UK (Sponsor of the study) and Sheffield Teaching Hospitals NHS Foundation Trust, UK (ethics approval granted by London - Bloomsbury Ethics committee, 19/LO/1507); Tel Aviv Sourasky Medical Center, Israel (ethics approval granted by the Helsinki Committee, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel, 0551-19TLV); Robert Bosch Foundation for Medical Research, Germany (ethics approval granted by the ethical committee of the medical faculty of The University of Tübingen, 647/2019BO2); University of Kiel, Germany (ethics approval granted by the ethical committee of the medical faculty of Kiel University, D438/18). Informed consent was provided by all participants to take part in the study and all research was performed in accordance with the Declaration of Helsinki. Inclusion and exclusion criteria are fully described in 24.